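We propose a novel framework, On-Demand MOtion Generation (ODMO), for generating realistic and diverse long-term 3D human motion sequences conditioned only on action types, with an additional capability for customization. ODMO improves over SOTA approaches on all traditional motion evaluation metrics when evaluated on three public datasets (HumanAct12, UESTC, and MOCAP). Furthermore, we provide both qualitative evaluations and quantitative metrics demonstrating several first-known customization capabilities afforded by our framework, including mode discovery, interpolation, and trajectory customization. These capabilities significantly widen the range of potential applications of such motion generation models. The novel on-demand generative capabilities are enabled by innovations in both the encoder and decoder architectures: (i) Encoder: contrastive learning in a low-dimensional latent space is used to create a hierarchical embedding of motion sequences, in which not only do the codes of different action types form different groups, but within an action type, codes of similar inherent patterns (motion styles) cluster together, making them readily discoverable; (ii) Decoder: a hierarchical decoding strategy first reconstructs the motion trajectory, which is then used to reconstruct the whole motion sequence. Such an architecture enables effective trajectory control. Our code is released on the GitHub page: https://github.com/roychowdhuryresearch/odmo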
translated by 谷歌翻译
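Task-oriented communications, mostly using learning-based joint source-channel coding (JSCC), aim to design a communication-efficient edge inference system by transmitting task-relevant information to the receiver. However, transmitting only task-relevant information without introducing any redundancy may cause robustness issues in learning due to channel variations, and JSCC, which directly maps the source data into continuous channel input symbols, poses compatibility issues with existing digital communication systems. In this paper, we address these two issues by first investigating the inherent tradeoff between the informativeness of the encoded representations and the robustness to information distortion in the received representations, and then proposing a task-oriented communication scheme with digital modulation, named discrete task-oriented JSCC (DT-JSCC), in which the transmitter encodes the features into a discrete representation and transmits it to the receiver with a digital modulation scheme. In the DT-JSCC scheme, we develop a robust encoding framework, named robust information bottleneck (RIB), to improve robustness to channel variations, and derive a tractable variational upper bound of the RIB objective using variational approximation to overcome the computational intractability of mutual information. Experimental results demonstrate that the proposed DT-JSCC achieves better inference performance than the baseline methods with low communication latency, and exhibits robustness to channel variations due to the applied RIB framework.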
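To make the idea of sending a discrete representation over a digital modulation scheme concrete, here is a minimal sketch. It assumes phase-shift keying (PSK) as the modulation and nearest-symbol detection at the receiver; the abstract does not say which modulation DT-JSCC uses, and the function names are hypothetical.

```python
import cmath
import math


def psk_modulate(index, order):
    """Map a discrete code index in [0, order) to a unit-energy PSK symbol.

    PSK is an assumption made for illustration only.
    """
    return cmath.exp(2j * math.pi * index / order)


def psk_demodulate(symbol, order):
    """Recover the index whose constellation point is closest to the received symbol."""
    return min(range(order), key=lambda k: abs(symbol - psk_modulate(k, order)))


if __name__ == "__main__":
    order = 8                      # size of the discrete codebook (assumed)
    sent = 3                       # index produced by a hypothetical encoder
    noise = 0.10 + 0.05j           # small channel perturbation
    received = psk_modulate(sent, order) + noise
    print(psk_demodulate(received, order))
```

With small perturbations the received symbol stays inside the decision region of the transmitted index, which is the kind of robustness to channel variation the abstract is concerned with; larger noise would flip the index and distort the received representation.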
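Electricity price is a key factor affecting the decision-making of all market participants. Accurate forecasting of electricity prices is very important, and it is also very challenging because electricity prices are highly volatile due to various factors. This paper proposes an integrated long-term recurrent convolutional network (ILRCN) model that predicts electricity prices while taking into account the attributes that contribute most to the market price. The proposed ILRCN model combines the functionalities of a convolutional neural network and the long short-term memory (LSTM) algorithm with a proposed novel conditional error correction term. The combined ILRCN model can identify both linear and non-linear behavior within the input data. We use ERCOT wholesale market price data along with load profiles, temperature, and other factors to illustrate the proposed model. The performance of the proposed ILRCN electricity price forecasting model is verified using performance/evaluation metrics such as mean absolute error and accuracy. Case studies show that the proposed ILRCN model is accurate and efficient in electricity price forecasting compared with the support vector machine (SVM) model, the fully connected neural network model, the LSTM model, and the LRCN model.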
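The two evaluation metrics named above can be sketched as follows. Mean absolute error is standard; the abstract does not define "accuracy" for a regression task, so the definition used here (100% minus the mean absolute percentage error) is an assumption, as are the function names and the sample prices.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted prices."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def percentage_accuracy(actual, predicted):
    """100 minus the mean absolute percentage error (an assumed definition).

    Requires every actual price to be non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mape = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
    return 100.0 * (1.0 - mape)


if __name__ == "__main__":
    actual = [30.0, 40.0, 50.0]      # illustrative prices, not real market data
    predicted = [28.0, 43.0, 50.0]
    print(mean_absolute_error(actual, predicted))
    print(percentage_accuracy(actual, predicted))
```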
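Graph neural networks (GNNs) have become increasingly popular and have achieved impressive results in many graph-based applications. However, extensive manual work and domain knowledge are required to design effective architectures, and the results of GNN models have high variance across different training setups, which limits the application of existing GNN models. In this paper, we present AutoHEnsGNN, a framework for building effective and robust models for graph tasks without any human intervention. AutoHEnsGNN won first place in the AutoGraph Challenge of KDD Cup 2020 and achieved the best rank score on five real-life datasets in the final phase. Given a task, AutoHEnsGNN first applies a fast proxy evaluation to automatically select a pool of promising GNN models. It then builds a hierarchical ensemble framework: 1) we propose graph self-ensemble (GSE), which can reduce the variance caused by weight initialization and efficiently exploit the information of local and global neighborhoods; 2) based on GSE, a weighted ensemble of different types of GNN models is used to effectively learn more discriminative node representations. To efficiently search the architectures and ensemble weights, we propose AutoHEnsGNN$_{\text{Gradient}}$, which treats the architectures and ensemble weights as architecture parameters and uses gradient-based architecture search to obtain optimal configurations, and AutoHEnsGNN$_{\text{Adaptive}}$, which can adaptively adjust the ensemble weights based on model accuracy. Extensive experiments on node classification, graph classification, edge prediction, and the KDD Cup challenge demonstrate the effectiveness and generality of AutoHEnsGNN.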
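As a rough illustration of adjusting ensemble weights from model accuracy, the sketch below turns validation accuracies into weights with a temperature-scaled softmax and averages per-class probabilities accordingly. The softmax rule, the temperature value, and the function names are assumptions for illustration; the abstract does not state the actual rule used by the adaptive variant.

```python
import math


def accuracy_weights(val_accuracies, temperature=0.05):
    """Turn validation accuracies into ensemble weights via a softmax.

    The softmax form and the temperature are assumed, not taken from the paper.
    """
    best = max(val_accuracies)
    exps = [math.exp((a - best) / temperature) for a in val_accuracies]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_ensemble(prob_lists, weights):
    """Combine per-model class probabilities for one node.

    prob_lists[m][c] is the probability model m assigns to class c.
    """
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists))
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    weights = accuracy_weights([0.80, 0.85, 0.70])   # illustrative accuracies
    node_probs = [[0.6, 0.4], [0.3, 0.7], [0.9, 0.1]]
    print(weights)
    print(weighted_ensemble(node_probs, weights))
```

A lower temperature concentrates the weight on the most accurate model, while a higher one approaches a plain average.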
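Deep convolutional neural networks (CNNs) are often of sophisticated design, with numerous learnable parameters for the sake of accuracy. To alleviate the high cost of deploying them on mobile devices, recent works have made great efforts to excavate redundancy in pre-defined architectures. Nevertheless, the redundancy in the input resolution of modern CNNs has not been fully investigated, i.e., the resolution of the input image is fixed. In this paper, we observe that the smallest resolution needed for the same neural network to accurately predict a given image differs from image to image. To this end, we propose a novel dynamic-resolution network (DRNet) in which the input resolution is determined dynamically for each input sample. A resolution predictor with negligible computational cost is explored and optimized jointly with the desired network. Specifically, the predictor learns the smallest resolution that can retain, and even exceed, the original recognition accuracy for each image. During inference, each input image is resized to its predicted resolution to minimize the overall computational burden. We then conduct extensive experiments on several benchmark networks and datasets. The results show that DRNet can be embedded in any off-the-shelf network architecture to obtain a considerable reduction in computational complexity. For instance, DR-ResNet-50 achieves similar performance with about a 34% reduction in computation, and gains a 1.4% increase in accuracy with a 10% reduction in computation, compared with the original ResNet-50 on ImageNet.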
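The per-image choice of resolution can be illustrated with a small sketch. It assumes a fixed set of candidate resolutions and a table recording whether the network classifies an image correctly at each one; picking the smallest correct resolution is one plausible way to form a target for the predictor, not necessarily the procedure used in the paper. The cost estimate assumes computation scales with the number of input pixels, which is only an approximation for convolutional networks.

```python
def smallest_sufficient_resolution(correct_by_resolution):
    """Return the smallest resolution at which the prediction is correct.

    correct_by_resolution maps a candidate resolution (pixels per side) to
    True/False. Falls back to the largest candidate if none is correct.
    """
    candidates = sorted(correct_by_resolution)
    for resolution in candidates:
        if correct_by_resolution[resolution]:
            return resolution
    return candidates[-1]


def relative_cost(resolution, full_resolution):
    """Approximate compute relative to the full resolution (assumed quadratic in side length)."""
    return (resolution / full_resolution) ** 2


if __name__ == "__main__":
    # Hypothetical outcomes for one image at three candidate resolutions.
    outcomes = {128: False, 160: True, 224: True}
    chosen = smallest_sufficient_resolution(outcomes)
    print(chosen, relative_cost(chosen, 224))
```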
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.
Recent advances in self-supervised learning (SSL) in computer vision are primarily comparative: their goal is to preserve invariant and discriminative semantics in latent representations by comparing siamese image views. However, the preserved high-level semantics do not contain enough local information, which is vital in medical image analysis (e.g., image-based diagnosis and tumor segmentation). To mitigate the locality problem of comparative SSL, we propose to incorporate the task of pixel restoration to explicitly encode more pixel-level information into high-level semantics. We also address the preservation of scale information, a powerful aid to image understanding that has not drawn much attention in SSL. The resulting framework can be formulated as a multi-task optimization problem on the feature pyramid. Specifically, we conduct multi-scale pixel restoration and siamese feature comparison in the pyramid. In addition, we propose a non-skip U-Net to build the feature pyramid and develop sub-crop to replace multi-crop in 3D medical imaging. The proposed unified SSL framework (PCRLv2) surpasses its self-supervised counterparts on various tasks, including brain tumor segmentation (BraTS 2018), chest pathology identification (ChestX-ray, CheXpert), pulmonary nodule detection (LUNA), and abdominal organ segmentation (LiTS), sometimes outperforming them by large margins with limited annotations.
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient because it uses discrete tokens and requires fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient because it uses parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, and cardinality. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
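The masked modeling task described above can be sketched at the data level: a random subset of image tokens is replaced by a mask token, and the model is trained to predict the originals at those positions. The sketch below only builds the masked input and the targets; the mask token value, the masking ratio, and the function name are assumptions, and no model is involved.

```python
import random

MASK_ID = -1  # hypothetical id reserved for the mask token


def mask_tokens(tokens, mask_ratio, rng):
    """Randomly mask a fraction of image tokens.

    Returns the masked sequence and a dict mapping each masked position to the
    original token the model would be trained to predict.
    """
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK_ID if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(positions)}
    return masked, targets


if __name__ == "__main__":
    rng = random.Random(0)                      # seeded for repeatability
    image_tokens = [17, 4, 92, 8, 33, 51, 7, 60]  # illustrative token ids
    masked, targets = mask_tokens(image_tokens, mask_ratio=0.5, rng=rng)
    print(masked)
    print(targets)
```

Because every masked position is predicted from the same visible context, all of them can be filled in within one forward pass, which is what makes parallel decoding possible in contrast to token-by-token autoregressive generation.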
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
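The greedy rule can be illustrated on a toy problem where the data distribution is fully known, which stands in for the oracle access mentioned above. The sketch treats a small list of equally likely samples as the true distribution and, given the features observed so far, picks the unobserved feature with the largest conditional mutual information with the label. It does not implement the amortized learning approach; all names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_mutual_information(samples, observed, feature):
    """I(Y; X_feature | observed), with samples treated as the exact distribution.

    samples: list of (x, y), where x is a tuple of discrete feature values.
    observed: dict mapping already-queried feature indices to their values.
    """
    rows = [(x, y) for x, y in samples
            if all(x[i] == v for i, v in observed.items())]
    if not rows:
        return 0.0
    h_before = entropy([y for _, y in rows])
    groups = {}
    for x, y in rows:
        groups.setdefault(x[feature], []).append(y)
    h_after = sum(len(ys) / len(rows) * entropy(ys) for ys in groups.values())
    return h_before - h_after


def greedy_next_feature(samples, observed, n_features):
    """Pick the unobserved feature with the highest conditional mutual information."""
    candidates = [i for i in range(n_features) if i not in observed]
    return max(candidates,
               key=lambda i: conditional_mutual_information(samples, observed, i))


if __name__ == "__main__":
    # Toy distribution: the label copies feature 0; feature 1 is pure noise.
    samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
    print(greedy_next_feature(samples, observed={}, n_features=2))
```

In practice the conditional distributions are not available, which is why a learned approximation of this policy is needed.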